
Poster: yadfi Date: Nov 3, 2009 6:14am
Forum: texts Subject: Re: Google rejects?

I am really ONLY talking about Google scans uploaded here. I am one of those amateurs who scan vast numbers of books, and I hugely appreciate the work done by others. My complaint concerns Google alone, specifically material uploaded by user tpb. The most recent scan I looked at was Lovelich's Merlin (http://www.archive.org/details/merlinamiddleen00lovegoog), which has one very odd-looking page somewhere in the middle. Ones I rejected recently were the multiple volumes of Hector Boece's Chronicle. These are all genuine Google scans (of books from the university libraries Google is known to be scanning), and since they are quite often faulty, I wondered whether they might be rejects.

Poster: stbalbach Date: Nov 3, 2009 7:12am
Forum: texts Subject: Re: Google rejects?

Google is/was notorious for poor-quality scans, particularly in its first few years. There are even entire blogs devoted to showing examples of things like people's fingers in the pages. Internet Archive scans have always been of much better quality. Google has improved somewhat recently and has even begun rescanning some of its earlier books. The Google scans on IA are just copies that users have brought over; they are not rejects.

Poster: Time Traveller Date: Nov 3, 2009 8:23pm
Forum: texts Subject: Re: Google rejects?

So I was correct about us moving stuff over from Google.

And could the bad scans by Google date from before it began pushing Google Books as a major WWW service (a money maker)?

Poster: garthus Date: Nov 3, 2009 8:56pm
Forum: texts Subject: Re: Google rejects?

Peter,

See:

http://www.archive.org/details/Order_To_Show_Cause_With_TRO_New_York_State__763

For one of the many cases I have been involved in. This one is returnable November 12 of this year. I will post the others when I get the time. (It seems engineers also make good lawyers.) I did this without the help or input of any lawyers. It is wonderful how, as the system gets more complicated, those with the knowledge can use it against itself.

Gerry

Poster: garthus Date: Nov 3, 2009 8:39pm
Forum: texts Subject: Re: Google rejects?

Luke,

I have probably looked at thousands of Google scans and a similar number of Archive scans, and I have put up nearly 900 myself. Statistically, Google and Microsoft scans have a very low defect rate considering the materials the scanners were working with, and their quality has improved over time. My objection concerns their bastardization of the images with watermarks, but that is another story. In any case, a bad scan is better than no scan, and this can all be worked out over time. If someone wants perfection, they have to do what many of us have done: scan the items ourselves. The point is that it is getting better, and with more caring volunteers it will continue to get better.

Gerry

Poster: yadfi Date: Nov 4, 2009 8:25am
Forum: texts Subject: Re: Google rejects?

Thanks for the replies. It is reassuring to find so many dedicated people and to have the Google business cleared up a little. I am still astonished that they originally did such a shoddy job (I find the Microsoft scans much better). I work almost exclusively with scanned texts nowadays, and a prerequisite is that they are scanned well (which is why I do most of my own scanning), because only then can they be OCRed successfully. For me this last stage is the crucial one, because I use a search engine that indexes all this material and allows sophisticated (proximity) searches, on which I base my work. It is a great relief to find a huge work scanned by Google, but an even greater frustration to then discover that the result is useless for OCR, so that I have to do it myself anyway. As I said before, this happens again and again. In my case a shoddy job is worse than nothing at all, because I lose a lot of time checking it only to find it is not as good as it should be.